\documentclass{article}
\usepackage{hyperref}

\setlength{\parindent}{0pt}
\setlength{\parskip}{1em}
\addtolength{\hoffset}{-1in}
\addtolength{\textwidth}{1.5in}
\addtolength{\voffset}{-0.5in}
\addtolength{\textheight}{1.25in}

\begin{document}

\title{Computer programs for simulation-based estimation of peer effects}
\author{Brian Krauth \\
	Simon Fraser University}
\date{Version 1.2 - (March 2020)}
	
\maketitle

This file is the documentation for a set of computer programs
I have written to implement the structural estimation method
described in my paper ``Simulation-based estimation of peer effects'' \cite{smle}.

\section{Installation}

Version 1.2 is available at 
\url{https://github.com/bvkrauth/smle/releases/tag/v1.2}

Future updates will be available at \url{https://github.com/bvkrauth/smle}.

\subsection{Windows Installation- Stata (NEW)}

If you are running Windows and have Stata, you can install and 
use the Stata package {\tt smle} by running the command:
\begin{verbatim}
. net install smle, from("http://www.sfu.ca/~bkrauth/code")
\end{verbatim}
Once the package is installed, you can access {\tt help smle}
and ignore the rest of this document.

\subsection{Windows Installation - binaries}

If you do not have Stata, you can install and run the binary files directly. 
There are two estimation programs:
\begin{itemize}
	\item {\tt smle.exe:} Estimates structural model from an individual-based sample.
	\item {\tt s2.exe:} Estimates structural model from a group-based sample.
\end{itemize}
To install the programs, simply copy them into a directory of your choice.  
Both programs are self-contained, and can be copied to and run
from any directory.

\subsection{Installation in other operating systems}

If you have a modern Fortran compiler (i.e., one that can compile Fortran 90/95), 
the source code can be compiled directly.  This allows for use under Linux and other 
operating systems, and allows for the modification of code when needed.  
See Appendix~\ref{sec:compiling} for details.


\section{Basic use}

Both {\tt smle} and {\tt s2} run from the command prompt and perform the following
steps:
\begin{enumerate}
	\item Read in two text files:
		\begin{itemize}
			\item A {\bf parameter file}, i.e., a text file describing
				your preferred settings for the various user options.  
			\item A {\bf data file}, i.e., a text file containing the 
				data set from which you are estimating the model.
	\end{itemize}
	\item Perform calculations, logging the process in a {\bf log file}.
	\item Write the results to a {\bf results file}.
\end{enumerate}
To use any of the programs, you first need to create the two input files.
The format of the parameter and data files varies by program.
Details on how to construct these files can be found in Sections \ref{sec:smle} through \ref{sec:psim}.

Once you have created the data and parameter files, place the program you wish
to use in the same directory, open a Windows command prompt in that directory
and execute the command \\
\begin{center}
{\tt program-name parameter-file-name}
\end{center}
where
{\tt program-name} is the name of the program (either {\tt smle}, {\tt s2},
{\tt probit}, or {\tt psim}) and {\tt parameter-file-name} is the name\footnote{Note that
in previous versions, the programs did not take any arguments, and the parameter file 
had to be named {\tt parm.dat}.} of the parameter file you have created.

The program will run for anywhere from a few minutes to a few weeks,
depending on the size of your dataset and the options selected.
While running it will create several output files, including 
a log file and a file of results.

{\bf Example:} Suppose you have installed the program into
the directory {\tt c:/my-directory/}.  Open the Windows
command prompt and execute the following sequence of 
commands:
\begin{verbatim}
cd \my-directory\examples\smle
copy \my-directory\windows-binaries\smle.exe .
smle example1.txt
\end{verbatim}
This will run the {\tt smle} program with the example parameter file
{\tt example1.txt}.  It will run for a few minutes and 
create output files {\tt ex1\_log.txt} and {\tt ex1\_results.txt}.
The contents of both the parameter file and the output files
will be discussed in Section \ref{sec:smle}.


\section{Using the {\tt smle} program }\label{sec:smle}

The {\tt smle.exe} program is used to estimate the structural model 
described in the paper from an individual-based sample.  

\subsection{Data file}

The program expects data in whitespace-delimited ASCII format, with
no headers.  Each row should contain a single observation,
consisting of the following columns:
\begin{center}
\begin{tabular}{|c|c|c|c|c|c p{0in}|}
\hline
{\footnotesize {\bf Column \# }} & {\footnotesize 1} & {\footnotesize 2} & {\footnotesize 3} & \multicolumn{2}{c}{\footnotesize 4+}  & \\ 
\hline
{\footnotesize {\bf Variable }} & {\footnotesize Respondent's} & {\footnotesize Average} & {\footnotesize Number} & {\footnotesize Aggregate} & {\footnotesize Individual}  & \\
                & {\footnotesize Own}       & {\footnotesize Peer}    & {\footnotesize of}     & {\footnotesize Explanatory}  & {\footnotesize Explanatory} & \\
				        & {\footnotesize Choice}  & {\footnotesize Choice}  & {\footnotesize Peers}  & {\footnotesize Variables} & {\footnotesize Variables} & \\
 & ($y_{gi}$) & ($\bar{y}_{gi}$) & ($n_g-1$) & (${\bf z}_g$) & (${\bf x}_{gi}$) & \\
\hline
{\footnotesize {\bf Range }} & {\footnotesize $\{0,1\}$} & {\footnotesize $[0.0,1.0]$} & {\footnotesize $\{1,2,3,\ldots\}$} & {\footnotesize $(-\infty,\infty)$} & {\footnotesize $(-\infty,\infty)$} & \\
\hline
\end{tabular}
\end{center}
These variables are as described in the paper.


\vspace{0.1in}

\noindent{\bf Example}: The first three rows in the example data 
set {\tt examples/ex1\_data.txt} look like this:
{\scriptsize
\begin{verbatim}
1	0.25 4 0.28822476
0	1.00 4 0.084842203
0	0.75 4 0.16752506
\end{verbatim}
}
This data set features a single explanatory variable.  
For the first observation, we have $y_{gi}=1$, $\bar{y}_{gi}=0.25$, $(n_g-1)=4$ (so that $n_g=5$),
and $x_{gi}=0.28822476$.

\subsection{Parameter file and user options}

The parameter file for the {\tt smle} program is just a text file 
specifying your preferred settings for various user options.  

The program is very primitive in how it reads this file.  In 
particular it looks on specific lines of the file for specific 
variables.  For example, it will always set {\tt DATAFILE} to whatever is in the 
33rd line in the file.  The first four lines and any even numbered
lines are ignored, so they can be used for any comments one might want.
When creating your own parameter file, it is best to start with
an example parameter file from the {\tt examples} directory, and then
edit the file as needed for your application.
The contents of a properly-constructed parameter file are
listed below:

\begin{description}
\item[Line 5] ({\tt NVAR}): Number of exogenous explanatory variables (both group-level and individual-level)
	in data set (integer).
\item[Line 7] ({\tt NOBS}): Number of observations in data set (integer).
\item[Line 9] ({\tt NUMAGG}): The first {\tt NUMAGG} explanatory variables in the data set will be treated 
	as aggregate (i.e., group-level) variables (integer).  Group-level explanatory variables
	are treated differently in the model from individual-level explanatory variables.  See Section 4.4 in 
	the paper for details.  Note that the estimation method requires at least one individual-level 
	variable, so {\tt NUMAGG} must be strictly less than {\tt NVAR}.
\item[Line 11] ({\tt NSIM}): Number of simulations to use in calculating the log-likelihood function (integer).
	The program uses randomized Halton sequences, which accurately approximate probabilities using
	far fewer simulations than standard random numbers.  About {\tt NSIM=100} seems to work well enough 
	for a first pass.  One way to see if your value of {\tt NSIM} is big enough is to estimate the
	model several times and see if the parameter estimates change substantially.  If they do,
	then {\tt NSIM} is not big enough.
\item[Line 13] ({\tt RESTARTS}): Number of times to run the Davidson-Fletcher-Powell (DFP) search algorithm (integer).  
	At least {\tt RESTARTS=3} is recommended to avoid finding a local rather than global optimum.   
	If {\tt RESTARTS=0}, the simulated annealing (SA) search algorithm will be used instead of DFP.  Simulated
	annealing is usually much slower than DFP but is more robust in solving optimization problems
	with lots of local optima.
\item[Line 15] ({\tt SIMULATOR\_{}TYPE}): Type of simulator to use in calculating normal rectangle probabilities.
	Options include (program ignores all but the first letter, not case sensitive)
	\begin{itemize}
		\item {\tt G(HK)}: Geweke-Hajivassiliou-Keane simulator.  A slower but more accurate simulator that 
			is currently only available in combination  with {\tt EQUILIBRIUM\_{}TYPE=L}.
		\item {\tt H(ybrid)}: GHK-CFS hybrid simulator.  A faster and more flexible simulator, but generates
			a discontinuous approximate likelihood function.  It is recommended that the simulated annealing
			search algorithm be used if {\tt SIMULATOR\_{}TYPE = HYBRID}.
	\end{itemize}
	The GHK simulator and DFP search algorithm is the recommended combination, when available.
\item[Line 17] ({\tt EQUILIBRIUM\_{}TYPE}): Equilibrium selection rule assumed (character). See Sections
	2.3 and 4.2 in the paper for details.   Options include (program ignores all but 
	the first letter, not case sensitive):
	\begin{itemize}
		\item {\tt L(ow)}: Low-activity equilibrium
		\item {\tt H(igh)}: High-activity equilibrium
		\item {\tt R(andom)}: Random equilibrium
		\item {\tt B(ounds)}: Find selection-rule-free bounds (upper and lower) on $\gamma$ using 
			the likelihood bounds	method, as described in Section 4.2 of the paper.
		\item {\tt P(lot)}: Calculate selection-rule-free bounds on the likelihood function for 
			plotting (as in Figure 2 of the paper).
		\item {\tt M(inumum)}: Find selection-rule-free bounds (lower bound only) 
			on $\gamma$ using the likelihood bounds	method.
	\end{itemize}
\item[Line 19] ({\tt UNDERREPORTING\_{}CORRECTION}): Indicates whether or not to correct for underreporting (logical).  	See Section 4.3 in the paper for details.
\item[Line 21] ({\tt BOOTSTRAP}): Indicates whether to estimate the model once from the original sample 
	({\tt BOOTSTRAP = .FALSE.}) or 100 times from a series of bootstrap resamples ({\tt BOOTSTRAP = .TRUE.}).
\item[Line 23] ({\tt LOAD\_{}U}): Indicates whether to use the internal random number generator to produce 
	random numbers, or to load from the file specified as {\tt UFILE} (logical).  
\item[Line 25] ({\tt RHO\_{}TYPE}): Rule for treating the within-group correlation in unobservables $\rho_{\epsilon}$
	See Sections 2.4 and 4.1 in the paper for details. Options include:
	\begin{itemize}
		\item {\tt X} (recommended): Assume $\rho_{\epsilon}=\rho_x$.  This is the baseline
			restriction in the paper.
		\item {\tt F(ixed)}: Fix $\rho_{\epsilon}$ at the value of {\tt FIXED\_RHO} specified in line 27.
		\item {\tt E(stimate)}: Estimate $\rho_{\epsilon}$ directly (not recommended if {\tt FIX\_GAMMA=.true.}).
		\item {\tt I(nterval)}: Estimate $\hat{\gamma}(\rho_{\epsilon})$ function described in Section 4.1 of
			the paper.
	\end{itemize}
\item[Line 27] ({\tt FIXED\_RHO}): Value at which to fix $\rho_{\epsilon}$.  Ignored if {\tt RHO\_TYPE $\neq$ FIXED}.
\item[Line 29] ({\tt FIX\_GAMMA}): Indicates whether the value of $\gamma$ should be fixed rather than estimated (logical).
	See section 2.4 in the paper for details.
\item[Line 31] ({\tt FIXED\_GAMMA}): Value at which to fix $\gamma$ (real).  Ignored if {\tt FIX\_GAMMA = .false.}.
\item[Line 33] ({\tt DATAFILE}): Name of file from which data will be read.
\item[Line 35] ({\tt LOGFILE}): Name of file to which log data will be written.
	File will be overwritten.
\item[Line 37] ({\tt RESULTFILE}): Name of file to which estimation results will be written.  Results
	will be appended to file.
\item[Line 39] ({\tt UFILE}): Name of file in which random numbers are stored.  
	If {\tt LOAD\_U = .TRUE.}, then the random numbers will be read from this file.  If 
	{\tt LOAD\_U = .FALSE.}, then the random numbers will be written to this file.
\item[Line 41] ({\tt BOOTFILE}): Same as {\tt UFILE}, except the {\tt BOOTFILE} is where information on the bootstrap sample is stored/loaded.
\item[Line 43] ({\tt FIXEDEFFECTS}): Usually this should be zero, or it can be left out of the file entirely.
	Normally, the underreporting correction assumes a constant reporting rate for all respondents.
	It is possible to condition the estimated reporting rate on one or more of the aggregate
	explanatory variables: the program will estimate the reporting rate conditional
	on the first {\tt FIXEDEFFECTS} columns of explanatory variables in the data file.
\end{description}

\vspace{0.5in}
{\bf Example}: Take a look at the example parameter file 
{\tt examples/smle/example1.txt} provided in the 
distribution:

\hspace{1in}\begin{minipage}[c]{5in}
{\scriptsize
\begin{verbatim}
EXAMPLE1.TXT: SMLE example parameter file #1
This file shows a simple example, with the standard options
Note: data is in fixed format; do not delete lines
NVAR (number of variables, positive integer)
1
NOBS (number of observations, positive integer)
1000
NUMAGG (number of variables that are aggregate, nonnegative integer)
0
NSIM (number of simulations used to calculate likelihood function)
100
RESTARTS (number of times to restart the search algorithm)
3
SIMULATOR_TYPE (GHK or HYBRID)
GHK
EQUILIBRIUM_TYPE (LOW, HIGH, RANDOM, BOUNDS, PLOT, MINIMUM BOUNDS)
LOW
UNDERREPORTING_CORRECTION (correct for underreporting, logical)
.false.
BOOTSTRAP (calculate bootstrap covariance matrix, logical)
.false.
LOAD_U (load random numbers from file rather than generate them, logical)
.true.
RHO_TYPE (X, Fixed, Interval, or Estimate)
X
FIXED_RHO (Used if RHO_TYPE=FIXED, real)
0.0
FIX_GAMMA (logical)
.false.
FIXED_GAMMA (Used if FIX_GAMMA=.true., real)
0.0
DATAFILE (Name of data file)
ex1_data.txt
LOGFILE (Name of file to write log info)
ex1_log.txt
RESULTFILE (Name of file to write results to)
ex1_result.txt
UFILE (Name of file to write random numbers to, or read them from)
ex1_u.txt
BOOTFILE (Name of file to write bootstrap sample to, or read them from)
ex1_boot.dat
FIXEDEFFECTS (Number of aggregate variables to treat as fixed effects)
0

\end{verbatim}
}
\end{minipage}
 
This example shows a typical\footnote{One way that this implementation is not typical
is that it sets {\tt LOAD\_U}$=${\tt .TRUE.}, i.e. the program is asked to
construct the simulations based on the random numbers provided in the 
file {\tt ex1\_u.txt}. The usual setting would be {\tt LOAD\_U}$=${\tt .FALSE.}, 
i.e., the program should generate a new set of random numbers each time it runs.}
implementation with basic options.
The data are in a file named {\tt ex1\_data.txt}, and consist
of 1,000 observations ({\tt NOBS}$=1000$) with a single explanatory 
variable ({\tt NVAR}$= 1$) which is individual-level rather than group-level
({\tt NUMAGG}$=0$).  The likelihood function is to be estimated using 
the GHK simulator ({\tt SIMULATOR\_TYPE}$=${\tt GHK}) based on 100
simulations ({\tt NSIM}$=100$), and is to be maximized using the
DFP method with 3 restarts ({\tt RESTARTS}$=3$).  The model is to be estimated
from the original data and not a bootstrap resample of the data 
({\tt BOOTSTRAP}$=${\tt .FALSE.}) under the assumptions that 
individuals accurately report the behavior of their peers
({\tt UNDERREPORTING\_CORRECTION}$=${\tt .FALSE.}), that 
groups always play the ``low-activity'' equilibrium 
({\tt EQUILIBRIUM\_TYPE}$=${\tt Low}), and that the within-group
correlation in unobservables is equal to the within-group correlation 
in observables ({\tt RHO\_TYPE}$=${\tt X}).  Intermediate results 
are to be output to the log file {\tt ex1\_log.txt}, and 
final results are to be output to the results file
{\tt ex1\_result.txt}.


\subsection{Output files}\label{sec:output_smle}

The program is designed to run in the background, and does not write to standard output unless there is 
a problem.  Several files are written out to the work directory while the program runs.


\subsubsection{Log file}

As it runs, the program writes out detailed information on its operations to the file specified as 
{\tt LOGFILE} in the parameter file.  

\subsubsection{Results file (ordinary estimation)}

Estimation results are written\footnote{If the {\tt RESULTFILE} already exists, 
the results are appended to whatever is already in the file.}
to the file specified as {\tt RESULTFILE} in the parameter file.
The first number written to the file is the (maximized) log-likelihood.  After that come 
the estimates $(\hat{\rho}_x,\hat{\rho}_{\epsilon},\hat{\gamma},\hat{\beta})$.

{\bf Example:} If you execute the command {\tt smle example1.txt}
from the {\tt examples/smle/} directory, the results file 
might\footnote{On many systems the program will break long lines, 
so be sure to look directly at the text file before importing it 
into some other program.} look like this:
{\scriptsize
\begin{verbatim}
  -2269.71293038489 0.269466878031412 0.269466878031412 -6.618180087163724E-003 -2.755343936372363E-002 0.377144904928951     
\end{verbatim}}
The interpretation of these results is simple.  The maximized log-likelihood 
is approximately $-2269.71$.  The estimated correlation in observables 
is $\hat{\rho}_x \approx 0.269$.  The estimated correlation in unobservables
is also $\hat{\rho}_{\epsilon} \approx 0.269$ (note that $\hat{\rho}_x=\hat{\rho}_{\epsilon}$, 
as specified in the parameter file).  The estimated peer effect is 
$\hat{\gamma} \approx -6.618 \times 10^{-3}$ (essentially zero).  The estimated intercept
is $\hat{\beta}_0 \approx -2.755 \times 10^{-2}$ and the estimated coefficient on the
single explanatory variable is $\hat{\beta}_1 \approx  0.377$.

\subsubsection{Results file (bootstrap estimation)}

If the user option {\tt BOOTSTRAP} (line 21 of the parameter file) is set 
to {\tt .TRUE.}, the program will estimate the model on a (naive) bootstrap
sample of the original data, and then write the results to {\tt RESULTFILE}.
The program will repeat the process 100 times\footnote{If you wish to generate 
a larger bootstrap sample, just run the program again a few times.  Because the program appends 
the results to {\tt RESULTFILE} your new bootstrap results
will be appended to your old results.}, drawing a new bootstrap sample
each time.  This may take a very long time - if it takes 10 minutes
to estimate the model once, it will take 17 hours to estimate it 100 times.

{\bf Example:} If you execute the command {\tt smle example6.txt} the 
program will create a results file called {\tt ex6\_results.txt} that
looks something like this:
{\scriptsize
\begin{verbatim}
  -2.2697129518736533E+03   0.2689591776554643  -1.0613797086889261E-02  -2.7751454571931952E-02   0.3768691221571507
  -2.2617211025870947E+03   0.2707122386550440  -9.0122868167062196E-03  -8.1050563747404186E-03   0.3787628374547526
  etc.
\end{verbatim}}
Note that the program does not calculate or report bootstrap standard errors.  
Instead, you will need to read these bootstrap results into some standard statistical 
analysis package and calculate the covariance matrix from them.


\subsubsection{Results file (interval estimation)}

With the {\tt RHO\_{}TYPE = INTERVAL} option, the model is estimated under 
12 different specifications, and the results are reported for each of these 
specifications, in the following order:
\begin{description}
	\item[Line 1] Standard assumption, $\rho_{\epsilon}=\rho_x$, $\gamma$ to be estimated.
	\item[Line 2] $\gamma=0$, $\rho_{\epsilon}$ to be estimated.
	\item[Line 3] $\rho_{\epsilon}= 0.0$, $\gamma$ to be estimated.
	\item[Line 4] $\rho_{\epsilon}= 0.1$, $\gamma$ to be estimated.
	\item $\vdots$
	\item[Line 12] $\rho_{\epsilon}= 0.9$, $\gamma$ to be estimated.
\end{description}


\subsubsection{Results file (likelihood bounds estimation)}

Output for selection-rule-free estimation using the likelihood bounds
approach is also different from the standard case.

If {\tt EQUILIBRIUM\_{}TYPE = BOUNDS}, the program writes out the estimated
lower bound for $\gamma$, then the estimated upper bound.  Bounds are not calculated
or reported for the other parameters.

If {\tt EQUILIBRIUM\_{}TYPE = MINIMUM}, the program writes out only the estimated
lower bound for $\gamma$.

If {\tt EQUILIBRIUM\_{}TYPE = PLOT}, the program calculates the approximate
upper bound $H_g$ and lower bound $L_g$ on the log-likelihood function for $\gamma \in \{0.0,0.1,\ldots,4.0\}$
and writes $(\gamma,H_g,L_g)$. to the file.  This can be used to construct a plot
like that seen in Figure 2 of the paper.


\subsubsection{Checkpoint files}

Because the program can potentially run for a very long time before producing the final
data, ``checkpointing'' has been implemented.  

Periodically while running the program saves a binary representation of its 
current state to the file {\tt check.dat} and also a blank file called {\tt check.lock}. 
These two files are both deleted on successful completion of the program.
{\tt check.lock} functions as a simple locking file - if you try to run the program in a directory
that has a {\tt check.lock} file, it will stop itself before doing much of anything.  
This is to avoid potential conflicts from having multiple instances of the program 
running in the same directory simultaneously.

If the program is interrupted, it can usually be restarted from the last checkpoint.  To do this,
one needs only to delete the file {\tt check.lock}, and run the program as normally.  
The program will automatically search the work directory for the {\tt check.dat} file and load 
it.  


\section{Using the {\tt s2} program}\label{sec:s2}

The {\tt s2} program can be used to estimate the structural model from a group-based sample.

\subsection{Data file}

The program expects data in whitespace-delimited ASCII format, with
no headers.  Each row corresponds to an observation of an individual.
The columns are as follows:
\begin{center}
\begin{tabular}{|c|c|c|c|}
\hline
{\footnotesize {\bf Column \# }} & {\footnotesize \bf 1} & {\footnotesize 2} & {\footnotesize 3+} \\ 
\hline
{\footnotesize {\bf Variable }} & {\footnotesize Group } & {\footnotesize Respondent's} & {\footnotesize Explanatory} \\
                                & {\footnotesize ID Number}     & {\footnotesize Choice      } & {\footnotesize Variables} \\
\hline
{\footnotesize {\bf Range }} & {\footnotesize $\{0,1,2,\ldots\}$} & {\footnotesize $\{0,1\}$} & {\footnotesize $(-\infty,\infty)$} \\
\hline
\end{tabular}
\end{center}
If you intend to treat some of the explanatory variables as aggregates, put them 
before the other explanatory variables.

For example, with 2 explanatory variables and 5 observations the 
file might look like this:
{\scriptsize
\begin{verbatim}
1 1 -0.337  0.000
1 0 -1.734  1.000
1 0  0.532  0.000
2 1  0.536  1.000
2 1 -1.234  1.000
\end{verbatim}
}

\subsection{Parameter file and user options}

The parameter file for the {\tt s2} program is named {\tt parm.dat}.
and is just a text file with a set of user options specified.  It looks like this:

{\scriptsize \begin{verbatim}
Parameter file: Data is in fixed format; do not delete lines
DATAFILE: name of file where data is located
allobs.txt
RESULTFILE: name of file to which results should be appended
smle.out
LOGFILE: name of file to send logging information
lfile.log
NOBS: number of observations
1000
NVAR: number of exogenous explanatory variables
1
NUMAGG: number of exogenous explanatory variables that are aggregates
0
NSIM: number of simulations to use in calculating estimated loglikelihood
100
SEARCH_METHOD: DFP (Davidson-Fletcher-Powell) or SA (Simulated Annealing)
sa
RESTARTS: number of times to run search algorithm
2
EQUILIBRIUM_TYPE: equilibrium selection rule, either low, random, or high
Low
COVMAT_TYPE: method for calculating covariance matrix; either Hessian, OPG, or None
None
RHO_TYPE: 
X
FIXED_RHO: value to fix rho_e at if FIX_RHO=.true.
0.7000
FIX_GAMMA: normally gamma is estimated, but it is fixed if this is .true.
.false.
FIXED_GAMMA: value to fix gamma at if FIX_GAMMA=.true.
0.0000
LOAD_U: .true. if you want random numbers loaded from UFILE, .false. if you want new random numbers
.false.
UFILE: name of file to which random numbers should be written (if LOAD_U=.false.) or read (if LOAD_U=.true.)
testu.dat
\end{verbatim}}

The description of each line in this file is as follows:
\begin{enumerate}
\item {\tt DATAFILE}: Name of file from which data will be read (up to 12 characters, case sensitive).
\item {\tt RESULTFILE}: Name of file to which estimation results will be written (up to 12 characters, case sensitive).  Results are appended to this file.
\item {\tt LOGFILE}: Name of file to which log data will be written (up to 12 characters, case sensitive).
	File will be overwritten.
\item {\tt NOBS}: Number of observations in data set (integer).
\item {\tt NVAR}: Number of exogenous explanatory variables in data set (integer).
\item {\tt NUMAGG}: The first {\tt NUMAGG} explanatory variables in the data set will be treated as aggregates (integer).  See Section 4.4 in the paper for details.
\item {\tt NSIM}: Number of simulations to use in calculating the log-likelihood function (integer).
\item {\tt SEARCH\_METHOD}: Optimization method to use.  Options are:
	\begin{itemize}
		\item {\tt S(imulated Annealing)} (recommended): Use the simulated annealing search algorithm.
		\item {\tt D(avidson-Fletcher-Powell)}: Use the DFP search algorithm
	\end{itemize}
	Because the GHK-CFS hybrid simulator used in this program produces a discontinuous approximation
	to the log-likelihood function, the simulated annealing algorithm is recommended.
\item {\tt RESTARTS}: Number of times to restart the DFP search algorithm, if applicable (integer).  
\item {\tt EQUILIBRIUM\_{}TYPE}: Equilibrium selection rule assumed (character). 
	See Sections 2.3 and 4.2 in the paper for details.   Options include (program ignores all but 
	the first letter, not case sensitive):
	\begin{itemize}
		\item {\tt L(ow)}: Low-activity equilibrium.
		\item {\tt H(igh)}: High-activity equilibrium.
		\item {\tt R(andom)}: Randomly selected equilibrium.
		\item {\tt B(ounds)}: Find selection-rule-free bounds on $\gamma$ using the likelihood bounds
			method.
		\item {\tt P(lot)}: Calculate selection-rule-free bounds on the likelihood function for 
			plotting.
		\item {\tt M(inumum)}: Find selection-rule-free bounds (lower bound only) 
			on $\gamma$ using the likelihood bounds	method.
	\end{itemize}
\item {\tt COVMAT\_TYPE}: Method for estimating covariance matrix of parameter estimates.
	\begin{itemize}
		\item {\tt O(PG)}: Use outer product of gradients/BHHH method.
		\item {\tt H(essian)}: Use inverse Hessian method.  Although this method is available,
			it is not recommended as the small discontinuities in the approximated likelihood function 
			lead to poor approximation of the Hessian.
		\item {\tt N(one)}: Don't estimate covariance matrix.
	\end{itemize}
\item {\tt RHO\_{}TYPE}: Rule for treating the within-group correlation in unobservables $\rho_{\epsilon}$
	See Sections 2.4 and 4.1 in the paper for details. Options include:
	\begin{itemize}
		\item {\tt X} (recommended): Assume $\rho_{\epsilon}=\rho_x$.  This is the baseline
			identifying assumption discussed in the paper.
		\item {\tt F(ixed)}: Fix $\rho_{\epsilon}$ at the value of {\tt FIXED\_RHO} specified below.
		\item {\tt E(stimate)}: Estimate $\rho_{\epsilon}$ directly.  This is not recommended if {\tt FIX\_GAMMA=.false.},
			because $\rho_{\epsilon}$ is very weakly identified in this case.
		\item {\tt I(nterval)}: Estimate $\hat{\gamma}(\rho_{\epsilon}$ function described in paper.
	\end{itemize}
\item {\tt FIXED\_RHO}: Value at which to fix $\rho_{\epsilon}$ (real).  Ignored if {\tt RHO\_TYPE $\neq$ FIXED}.
\item {\tt FIX\_GAMMA}: Indicates whether the value of $\gamma$ should be fixed rather than estimated (logical).
\item {\tt FIXED\_GAMMA}: Value at which to fix $\gamma$ (real).  Ignored if {\tt FIX\_GAMMA = .false.}.
\item {\tt LOAD\_{}U}: Indicates whether to use the internal random number generator to produce 
	random numbers, or to load from a user-specified file (logical).  
\item {\tt UFILE}: Name of file in which random numbers are stored (up to 12 characters, case sensitive).  
	If {\tt LOAD\_U = .TRUE.}, then the random numbers will be read from this file.  If 
	{\tt LOAD\_U = .FALSE.}, then the random numbers will be written to this file.
\end{enumerate}


\subsection{Output files}

The output, logging, and checkpoint files for the {\tt s2} program
take almost the same form as described in Section~\ref{sec:output_smle} for the {\tt smle} program.
The only difference is that if {\tt COVMAT=OPG} or {\tt COVMAT=HESSIAN}, 
the estimated covariance matrix is reported as well. 

\section{Using the {\tt probit} program}

The {\tt probit} program estimates a standard (naive) probit model, treating
peer behavior as exogenous.

\subsection{Data format}

The program expects data in the same format as the {\tt smle} program
if data are from an individual-based sample, and in the same format 
as the {\tt s2} program if data are from a group-based sample.

\subsection{Parameter file and user options}

The parameter file for the {\tt probit} program is named {\tt parm.dat}.
and is just a text file with a set of user options specified.  It looks like this:
{\scriptsize
\begin{verbatim}
Parameter file - data is in fixed format; do not delete lines
NVAR
1
NOBS
1000
SAMPLE_TYPE
Individual
UNDERREPORTING_CORRECTION
.false.
DATAFILE
oneobs.txt
LOGFILE
probit.log
RESULTFILE
probit.out
\end{verbatim}
}

The description of each line in this file is as follows:
\begin{enumerate}
\item {\tt NVAR}: Number of explanatory variables in data set (integer).
\item {\tt NOBS}: Number of observations in data set (integer).
\item {\tt SAMPLE\_TYPE}: Format of input data.  Options include:
	\begin{itemize}
		\item {\tt I(ndividual)}: Individual-based sample, as described in Section~\ref{sec:smle}.
		\item {\tt G(roup)}: Group-based sample, as described in Section~\ref{sec:s2}.  Not yet implemented.
	\end{itemize}
\item {\tt UNDERREPORTING\_{}CORRECTION}: Indicates whether or not to correct for underreporting (logical).
\item {\tt DATAFILE}: Name of file from which data will be read (up to 12 characters, case sensitive).
\item {\tt LOGFILE}: Name of file to which log data will be written (up to 12 characters, case sensitive).
\item {\tt RESULTFILE}:Name of file to which estimation results will be written (up to 12 characters, case sensitive).  	Results	are appended to this file.
\end{enumerate}

\subsection{Output files}

Unlike {\tt smle} and {\tt s2}, the {\tt probit} program only takes a second or two to 
run.  As a result, there is no checkpointing.

The results of estimation are written to the file specified as {\tt RESULTFILE}.
The first number reported is the log-likelihood function, the second is the intercept,
the third is the coefficient on peer behavior, and the remainder are coefficients
on the other explanatory variables.


\section{Using the {\tt psim} program}\label{sec:psim}

The {\tt psim} program generates simulated data from the model.  It is useful
in constructing Monte Carlo experiments.

\subsection{Parameter file and user options}

The parameter file for the {\tt psim} program is named {\tt parmonte.dat}.
and is just a text file with a set of user options specified.  It looks like this:
{\tiny \begin{verbatim}
Parameter file: Data is in fixed format; do not delete lines
NGROUP: Number of groups to simulate
1000
MAXGROUPSIZE: maximum size of groups (right now all are same size)
5
NVAR: number of exogenous explanatory variables
1
NUMAGG: number of explanatory variables that are aggregate
0
EQTYPE: equilibrium type (low, high, or random)
Low
XTYPE: x type (Binary, or Normal)
N
B: coefficient vector, must be length NVAR+5
0.25 0.25 0.0 0.5 0.0 1.0
REPORTING_RATE
1.0
b2
0.0 0.00
\end{verbatim}}

The description of each line in this file is as follows:
\begin{enumerate}
\item {\tt NGROUP}: Number of groups to simulate (integer).
\item {\tt MAXGROUPSIZE}: Number of individuals per group (integer).
\item {\tt NVAR}: Number of explanatory ($x$) variables (integer).
\item {\tt NUMAGG}: The first {\tt NUMAGG} explanatory variables will be
	treated as aggregates.
\item {\tt EQTYPE}: Equilibrium selection rule.  Options are:
	\begin{itemize}
		\item {\tt L(ow)}: Low-activity equilibrium.
		\item {\tt H(igh)}: High-activity equilibrium.
		\item {\tt R(andom)}: Randomly selected equilibrium.
	\end{itemize}
\item {\tt XTYPE}: Allows for there to be non-normal explanatory variables.  Options are:
	\begin{itemize}
		\item {\tt N(ormal)}: $x$ is normally distributed (the usual assumption).
		\item {\tt B(inary)}: $x$ is binary.  
	\end{itemize}
\item {\tt B}: Coefficient vector, length {\tt NVAR+5}.  Order of elements is
	$(\rho_x,\rho_{\epsilon},\gamma,\delta,\beta_0,\beta_1,\ldots,\beta_{\tt NVAR})$.
	Note: $\delta$ is the contextual effect, if there is one.
\item {\tt REPORTING\_RATE}: This is a sequence of four real numbers,
	used to model inconsistent reporting.  Let $r_{i,j}$ be the choice of 
	person $i$ as reported by person $j$. The four numbers are, in order, 
	$\Pr(r_{i,i}=1|y_i=0)$, $\Pr(r_{i,i}=1|y_i=1)$, $\Pr(r_{i,j}=1|y_i=0)$,
	and $\Pr(r_{i,j}=1|y_i=1))$.
	For example ``{\tt 0.0 1.0 0.0 1.0}'' will describe the base case of
	truthful reporting.
\item {\tt B2}: This row should have two floating-point numbers.  The first is the 
	value of $corr(\beta {\bf x}_{gi},\epsilon_{gi})$ and the second is the 
	value of $corr(\beta {\bf x}_{gi},\epsilon_{gj})$.  Usually this will just be 
	``{\tt 0.0 0.0}''.
\end{enumerate}

\subsection{Output}

The {\tt psim} program will output two files, a data set that mimics an individual-based
sample named {\tt oneobs.txt} and a data set that mimics a group-based sample named
{\tt allobs.txt}.  The files are ready to be used by {\tt smle} and {\tt s2} respectively.

\appendix

\section{Compiling}\label{sec:compiling}

Compiling is a matter of following these steps:
\begin{enumerate}
\item Procure a Fortran 90 compiler. A good reference for Fortran 90 is 
	Metcalf and Reid's {\it Fortran 90/95 Explained}.  The programs have been 
	successfully compiled on Linux using the Portland Group PGF90 and 
	PGHPF compilers, as well as the Intel IFC compiler.  They have been successfully
	compiled on Windows using the free F compiler (The F language is a subset of 
	Fortran) provided by The Fortran Group ({\tt http://www.fortran.com}).  
	The Fortran Group also provides free F compilers for Linux and other
	operating systems.
\item Unzip the {\tt smle.zip} file into some appropriate directory.
	There will be one sub-directory for each major program, as well as 
	library ({\tt lib}) and documentation ({\tt doc}) subdirectories
\item System-specific code is in a file named {\tt lib/bklib.f90}.
  For example, this program includes subroutines that access the 
  compiler's default random number generator.  If you want to 
  use a better random number generator (for example, from the NAG libaries)
  this file can be modified to do so.
\item To compile the {\tt smle} program, for example, 
	edit the batch file {\tt compile\_{}smle} (for Linux) or {\tt compile\_smle.bat} (for Windows):
	\begin{itemize}
		\item Replace the call to ``{\tt f90}'' or ``{\tt F}'' with the name of your Fortran 90 compiler
		\item If you created your own {\tt bklib} file, replace the 
			reference to the filename ``{\tt ../lib/bklib.f90}'' with the name of your version.
		\item Adjust any of the compiler options as appropriate.
	\end{itemize}
\item Run the batch file you just edited.  If compilation is successful, there should be
	a new executable file called {\tt smle} (if Linux) or {\tt smle.exe} (if Windows).
\item Repeat for the other programs.
\end{enumerate}

\section{Version history}

\begin{itemize}
	\item Version 1.1 - March 1, 2005. First publicly released version.
	\item Version 1.1.1 - March 16, 2020.
		\begin{itemize}
			\item Windows binaries are now compiled on Intel Fortran compiler, leading to 
				a 43\% reduction in program time for SMLE.EXE and a 16\% reduction
				for S2.EXE.
			\item Program can now be compiled on IBM XL Fortran compiler.  This involved 3 changes to code:
				\begin{itemize}
					\item Workaround inserted to deal with compiler bug related to use of
						MATMUL command with calculated indices.
					\item Code that reads in data from text files now strips CR and LF characters from
						end of each line.
					\item Stack overflow error repaired.  
				\end{itemize}
			\item Program (windows binary version) can now take user-supplied name of parameter file.  
					Previous version only allowed the parameter file to be named ``parm.dat''.  
		\end{itemize}
	\item Version 1.2 - March 23, 2020
		\begin{itemize}
			\item Added Stata wrapper for both SMLE.EXE and S2.EXE.
			\item Bug fixes
		\end{itemize}
\end{itemize}


\begin{thebibliography}{breitestes Label}
\bibitem{smle}
Krauth, Brian V., 2006.  ``Simulation-based estimation of peer effects,'' 
{\it Journal of Econometrics} 133(1): 243-271. 
	
\end{thebibliography}

\end{document}